Abstract. With location-based services becoming increasingly popular, serious concerns are being raised about the potential privacy breaches that the disclosure of location information may induce. We consider two approaches that have been proposed to limit and control the privacy loss: one is the geo-indistinguishability notion of Andres et al., which is inspired by differential privacy and, like the latter, is independent of the side knowledge of the adversary and robust with respect to composition of attacks. The other one is the mechanism of Shokri et al., which offers an optimal trade-off between the loss of quality of service and the privacy protection with respect to a given Bayesian adversary. We show that it is possible to combine the advantages of the two approaches: given a minimum threshold for the degree of geo-indistinguishability, we construct a mechanism that offers the maximal utility, as the solution of a linear program. Thanks to the fact that geo-indistinguishability is insensitive to the remapping of a Bayesian adversary, the mechanism so constructed is optimal also in the sense of Shokri et al. Furthermore, we propose a method to reduce the number of constraints of the linear program from cubic to quadratic (with respect to the number of locations), maintaining the privacy guarantees without significantly affecting the utility of the generated mechanism. This considerably lowers the time required to solve the linear program, thus significantly enlarging the size of the location sets for which the optimal trade-off mechanisms can still be computed.


1 Introduction 

While location-based systems (LBSs) have been shown to provide enormous benefits to individuals and society, these benefits come at the cost of users' privacy: as discussed in [1,2,3], location data can be easily linked to a variety of other information about an individual, and expose sensitive aspects of her private life such as her home address, her political views, her religious practices, etc. There is, therefore, a growing interest in the development of location-privacy protection mechanisms (LPPMs), which allow users to use LBSs while providing sufficient privacy guarantees. Most of the approaches in the literature are based on somehow perturbing the information reported to the LBS provider, in order to make it difficult for the adversary to infer the user's true location [4I5I6I7I8T5].

Clearly, the perturbation of the information sent to the LBS provider leads 
to a degradation of the quality of service, and consequently there is a trade-off 
between the level of privacy that the user wishes to guarantee and the service 
quality loss (SQL) that she will have to accept. The study of this trade-off, and the design of mechanisms which optimize it, is an important research direction, started with the seminal paper of Shokri et al. [8].

Obviously, any such study must be based on meaningful notions of privacy and SQL. The authors of [8] consider the privacy threats deriving from a Bayesian adversary. More specifically, they assume that the adversary knows the prior probability distribution on the user's possible locations, and they quantify privacy as the expected distance between the true location and the best guess of the adversary once she knows the location reported to the LBS. This guess takes into account the information already in her possession (the prior probability), and it is by definition at least as accurate, on average, as the reported location. We also say that the adversary may remap the reported location.

The notion of quality loss adopted in [8] is also defined in terms of the expected distance between the real location and the reported location, with the important difference that the LBS is not assumed to know the user's prior distribution (in fact, in general it is not tuned for any specific user), and consequently it does not apply any remapping. For this reason, when the notion of distance is the same, the SQL is always greater than or equal to the privacy. The optimal mechanism in [8] is defined as the one which maximizes privacy for a given SQL threshold, and since these measures are linear functions of the noise (characterized by the conditional probabilities of each reported location given a true location), such a mechanism can be computed by solving a linear optimization problem.

In this paper, we also consider the geo-indistinguishability framework of [9], another notion of privacy which is based on differential privacy [10] and, more precisely, on the extension of differential privacy to arbitrary metrics. Intuitively, a mechanism provides geo-indistinguishability if two locations that are geographically close have similar probabilities to generate a certain reported location. Equivalently, the reported location will not significantly increase the adversary's chance to distinguish the true location among the nearby ones. Note that this notion protects the accuracy of the location: the adversary is allowed to distinguish locations which are far away. It is important to note that the property of geo-indistinguishability does not depend on the prior. This is a feature inherited from differential privacy, which makes the mechanism robust with respect to composition of attacks in the same sense as differential privacy (see footnote 4).
4 This does not mean that the prior knowledge is not harmful for privacy: it is harmful, for both geo-indistinguishability and differential privacy, as it is for any obfuscation mechanism. But the compositionality guarantees that if the prior knowledge is acquired only via differentially private mechanisms, then the loss of privacy is gradual and under control.

In this paper we aim at combining the advantages of the two above approaches to privacy protection. Namely, we aim at enhancing the optimal mechanism of [8], whose optimality and privacy level hold for a specific prior only, with privacy guarantees that are independent of the prior. Consider, for instance, a user for whom the optimal mechanism has been computed with respect to his average day (and consequent prior p), and who has very different habits in the morning and in the afternoon. By simply taking into account the time of the day, the adversary can use a different prior and mount a much more effective attack; consequently, the privacy guarantees that the mechanism provides for p can be violated in an uncontrolled way when the adversary has some additional knowledge. Choosing a mechanism that, besides being optimal for the given prior, also provides geo-indistinguishability would add an additional shield against the privacy loss.

In order to achieve this goal, we fix a lower bound on the level of geo-indistinguishability, and we compute a mechanism K with minimum SQL among those which respect the bound. The property of geo-indistinguishability is expressed by linear constraints, hence we can generate K by solving a linear optimization problem (note that this linear optimization problem is different from the one in [8]). Since geo-indistinguishability is not affected by remapping, the expected error of the adversary must coincide with the SQL, i.e., the adversary cannot gain anything by any remapping H; otherwise KH would still be geo-indistinguishable and provide a better SQL. Since the privacy coincides with the SQL, it must be maximal. In conclusion, we obtain a geo-indistinguishable K with minimum SQL and maximum privacy (for the given SQL).

Note that the optimal mechanisms are not unique, and ours does not usually coincide with the one produced by the algorithm of [8]. In particular, the one of [8] in general does not provide geo-indistinguishability, while ours does, by design. The robustness of the geo-indistinguishability property seems to favorably affect other notions of privacy as well: we have evaluated the two mechanisms with the privacy definition of [8] on some specific real data, and we have observed that, while they obviously coincide on the prior for which the mechanism of [8] is computed, ours performs significantly better when we consider different priors.

It is worth noting that the number of linear constraints required to express geo-indistinguishability is, in general, cubic with respect to the number of locations considered. We present an approximation technique to considerably reduce this number (from cubic to quadratic), based on the use of a metric induced by a spanning graph of the set of locations. This way, instead of considering the geo-indistinguishability constraints for every pair of locations, we only consider those for every edge in the spanning graph. We also show, based on experimental results, that for a reasonably good approximation our approach offers an improvement in running time with respect to the method of Shokri et al. We must note, however, that the mechanism obtained this way is no longer optimal with respect to the original metric, but with respect to the metric induced by the graph instead, and therefore the SQL of the mechanism might be higher, although our experiments also show that this increase is not significant.

Contribution The main contributions of this paper are the following: 

- We present a method based on linear optimization to generate a mechanism that is geo-indistinguishable and achieves optimal utility. Furthermore, the mechanism is also optimal with respect to the expected error of the adversary.

- We evaluate our approach under different priors (generated from real traces), and show that it outperforms the other mechanisms considered.

- We propose an approximation technique, based on spanning graphs, that can be used to reduce the number of constraints of the optimization problem and still obtain a geo-indistinguishable mechanism.

- We measure the impact of the approximation on the utility and the number of constraints, and analyze the running time of the whole method, obtaining favorable results.

Plan of the paper The rest of the paper is organized as follows. The next section recalls some preliminary notions. In Section 3 we illustrate our method to produce a geo-indistinguishable and optimal mechanism as the solution of a linear optimization problem, and we propose a technique to reduce the number of constraints used in the problem. In Section 4 we evaluate our mechanism with respect to other ones in the literature. Finally, in Section 5 we discuss related work and conclude.

2 Preliminaries 

2.1 Location obfuscation, quality loss and adversary's error 

A common way of achieving location privacy is to apply a location obfuscation mechanism, that is, a probabilistic function $K : \mathcal{X} \to \mathcal{P}(\mathcal{X})$, where $\mathcal{X}$ is the set of possible locations and $\mathcal{P}(\mathcal{X})$ denotes the set of probability distributions over $\mathcal{X}$. $K$ takes a location $x$ as input, and produces a reported location $z$ which is communicated to the service provider. In this paper we generally consider $\mathcal{X}$ to be finite, in which case $K$ can be represented by a stochastic matrix, where $k_{xz}$ is the probability to report $z$ from location $x$.

A prior distribution $\pi \in \mathcal{P}(\mathcal{X})$ on the set of locations can be viewed either as modelling the behaviour of the user (the user profile), or as capturing the adversary's side information about the user. Given a prior $\pi$ and a metric $d$ on $\mathcal{X}$, the expected distance between the real and the reported location is:

$$\mathrm{ExpDist}(K, \pi, d) = \sum_{x,z} \pi_x\, k_{xz}\, d(x,z)$$

From the user's point of view, we want to quantify the service quality loss (SQL) produced by the mechanism $K$. Given a quality metric $d_Q$ on locations, such that $d_Q(x,z)$ measures how much the quality decreases by reporting $z$ when the real location is $x$ (the Euclidean metric $d_2$ being a typical choice), we can naturally define the quality loss as the expected distance between the real and the reported location, that is $\mathrm{SQL}(K, \pi, d_Q) = \mathrm{ExpDist}(K, \pi, d_Q)$. The SQL can also be viewed as the (inverse of the) utility of the mechanism.
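As a minimal sketch (not the authors' code), assuming locations are points in the plane indexed $0, \dots, n-1$ and $K$ is a row-stochastic list of lists with `K[x][z]` $= k_{xz}$, ExpDist and SQL can be computed directly from the sum above. The names `exp_dist` and `sql` are hypothetical, and the Euclidean default for $d_Q$ is only an illustrative choice.

```python
import math
from typing import Callable, Sequence

Point = tuple[float, float]
Metric = Callable[[Point, Point], float]


def exp_dist(K: Sequence[Sequence[float]], prior: Sequence[float],
             locs: Sequence[Point], d: Metric) -> float:
    """ExpDist(K, pi, d) = sum over x, z of pi_x * k_xz * d(x, z)."""
    n = len(locs)
    return sum(prior[x] * K[x][z] * d(locs[x], locs[z])
               for x in range(n) for z in range(n))


def sql(K, prior, locs, d_Q: Metric = math.dist) -> float:
    """Service quality loss: ExpDist with the quality metric d_Q (Euclidean by default)."""
    return exp_dist(K, prior, locs, d_Q)


if __name__ == "__main__":
    locs = [(0.0, 0.0), (1.0, 0.0)]
    K = [[0.8, 0.2], [0.3, 0.7]]
    # Only the off-diagonal entries contribute: 0.5*0.2*1 + 0.5*0.3*1 = 0.25
    print(sql(K, [0.5, 0.5], locs))
```

Note that the SQL depends on the user's prior $\pi$, so the same mechanism can have different quality loss for different users.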

Similarly, we want to quantify the privacy provided by $K$. A natural approach, introduced in [12], is to consider a Bayesian adversary with some prior information $\pi$, trying to remap $z$ back to a guessed location $\hat{x}$. A remapping strategy can be modelled by a stochastic matrix $H$, where $h_{z\hat{x}}$ is the probability to map $z$ to $\hat{x}$. Then the privacy of the mechanism can be defined as the expected error of an adversary under the best possible remapping:

$$\mathrm{AdvError}(K, \pi, d_A) = \min_H \mathrm{ExpDist}(KH, \pi, d_A)$$

Note that the composition $KH$ of $K$ and $H$ is itself a mechanism. Similarly to $d_Q$, the metric $d_A(x, \hat{x})$ captures the adversary's loss when he guesses $\hat{x}$ when the real location is $x$. Note that $d_Q$ and $d_A$ can be different, but a natural choice is to use the Euclidean distance for both.
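For a finite $\mathcal{X}$, the minimization over $H$ decomposes over reported locations: the contribution of each row $h_z$ is linear in that row, so a deterministic remapping that sends $z$ to a guess minimizing $\sum_x \pi_x k_{xz} d_A(x, \hat{x})$ is optimal. The sketch below relies on this observation; the function name `adv_error`, the list-of-lists representation and the Euclidean default for $d_A$ are illustrative assumptions, and guesses are restricted to $\mathcal{X}$, as for the matrix $H$ above.

```python
import math


def adv_error(K, prior, locs, d_A=math.dist):
    """Return (AdvError(K, pi, d_A), optimal deterministic remapping z -> x_hat)."""
    n = len(locs)
    total, remap = 0.0, []
    for z in range(n):
        # Expected loss contributed by output z if the adversary guesses g
        # (joint weights pi_x * k_xz, not normalized into a posterior).
        costs = [sum(prior[x] * K[x][z] * d_A(locs[x], locs[g]) for x in range(n))
                 for g in range(n)]
        best = min(range(n), key=costs.__getitem__)
        remap.append(best)
        total += costs[best]
    return total, remap


if __name__ == "__main__":
    locs = [(0.0, 0.0), (1.0, 0.0)]
    K = [[0.8, 0.2], [0.3, 0.7]]
    # With a skewed prior the adversary maps every report to location 0.
    print(adv_error(K, [0.9, 0.1], locs))
```

With the skewed prior $(0.9, 0.1)$ the adversary's error is about $0.10$, below the SQL of about $0.21$ for the same prior, which illustrates why SQL is always at least AdvError when $d_Q = d_A$.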

A natural question, then, is to construct a mechanism that achieves optimal 
privacy, given an SQL constraint. 

Definition 1. Given a prior $\pi$, a quality metric $d_Q$, a quality bound $q$ and an adversary metric $d_A$, a mechanism $K$ is $q$-OptPriv$(\pi, d_A, d_Q)$ iff

1. $\mathrm{SQL}(K, \pi, d_Q) \le q$, and

2. for all mechanisms $K'$, $\mathrm{SQL}(K', \pi, d_Q) \le q$ implies

$$\mathrm{AdvError}(K', \pi, d_A) \le \mathrm{AdvError}(K, \pi, d_A)$$

In other words, a $q$-OptPriv mechanism provides the best privacy (expressed in terms of AdvError) among all mechanisms with SQL at most $q$. This problem was studied in [8], providing a method to construct such a mechanism for any $q, \pi, d_A, d_Q$ by solving a properly constructed linear program.

2.2 Differential privacy 

Differential privacy was originally introduced in the context of statistical databases, 
requiring that a query should produce similar results when applied to adjacent 
databases, i.e. those differing by a single row. The notion of adjacency is related to the Hamming metric $d_h(x, x')$, defined as the number of rows in which $x, x'$ differ. Differential privacy requires that the greater the Hamming distance between $x, x'$ is, the more distinguishable they are allowed to be.

This concept can be naturally extended to any set of secrets $\mathcal{X}$, equipped with a metric $d_\mathcal{X}$ [13,11]. The distance $d_\mathcal{X}(x,x')$ expresses the distinguishability level between $x$ and $x'$: if the distance is small then the secrets should remain indistinguishable, while secrets far away from each other are allowed to be distinguished by the adversary. The metric should be chosen depending on the application at hand and the semantics of the privacy notion that we try to achieve.
In this setting, a mechanism is a probabilistic function $K : \mathcal{X} \to \mathcal{P}(Z)$, where $Z$ is a set of reported values (assumed finite for the purposes of this paper). The similarity between probability distributions can be measured by the multiplicative distance $d_\mathcal{P}$, defined as $d_\mathcal{P}(\mu_1, \mu_2) = \sup_{z \in Z} \left| \ln \frac{\mu_1(z)}{\mu_2(z)} \right|$, with $\left| \ln \frac{\mu_1(z)}{\mu_2(z)} \right| = 0$ if both $\mu_1(z), \mu_2(z)$ are zero and $\infty$ if only one of them is zero. In other words, $d_\mathcal{P}(\mu_1, \mu_2)$ is small iff $\mu_1, \mu_2$ assign similar probabilities to each value $z$.
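A direct transcription of $d_\mathcal{P}$ for a finite $Z$ might look as follows, assuming both distributions are given as equal-length lists of probabilities over the same ordering of $Z$; `mult_dist` is a hypothetical name, and the supremum becomes a maximum because $Z$ is finite.

```python
import math


def mult_dist(mu1, mu2):
    """Multiplicative distance d_P between two distributions over the same finite Z."""
    if len(mu1) != len(mu2):
        raise ValueError("distributions must be over the same set Z")
    worst = 0.0
    for p, q in zip(mu1, mu2):
        if p == 0 and q == 0:
            continue              # |ln(0/0)| is taken to be 0
        if p == 0 or q == 0:
            return math.inf       # only one of them is zero
        worst = max(worst, abs(math.log(p / q)))
    return worst
```

For example, for $(0.8, 0.2)$ and $(0.4, 0.6)$ the largest log-ratio is $|\ln(0.2/0.6)| = \ln 3$, coming from the second value.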

The generalized variant of differential privacy under the metric $d_\mathcal{X}$, called $d_\mathcal{X}$-privacy, is defined as follows:

Definition 2. A mechanism $K : \mathcal{X} \to \mathcal{P}(Z)$ satisfies $d_\mathcal{X}$-privacy iff:

$$d_\mathcal{P}(K(x), K(x')) \le d_\mathcal{X}(x, x') \qquad \forall x, x' \in \mathcal{X}$$

or equivalently $K(x)(z) \le e^{d_\mathcal{X}(x,x')} K(x')(z)$ for all $x, x' \in \mathcal{X}$, $z \in Z$. A privacy parameter $\varepsilon$ can also be introduced by scaling the metric $d_\mathcal{X}$ (note that $\varepsilon d_\mathcal{X}$ is itself a metric).
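The equivalent pointwise form gives a simple check that a finite mechanism satisfies $\varepsilon d_\mathcal{X}$-privacy. The sketch below assumes $Z = \mathcal{X}$, a matrix `K` with `K[x][z]` $= K(x)(z)$, the Euclidean metric as default $d_\mathcal{X}$, and a small additive tolerance `tol` for floating-point noise; `is_private` is a hypothetical helper, not part of the paper.

```python
import math


def is_private(K, locs, eps, d=math.dist, tol=1e-12):
    """True iff K(x)(z) <= exp(eps * d(x, x')) * K(x')(z) for all x, x', z."""
    n = len(locs)
    for x in range(n):
        for xp in range(n):
            bound = math.exp(eps * d(locs[x], locs[xp]))
            for z in range(len(K[x])):
                if K[x][z] > bound * K[xp][z] + tol:
                    return False
    return True
```

For two locations at distance 1 and $K = ((0.6, 0.4), (0.4, 0.6))$, the largest ratio is $1.5$, so the check passes for $\varepsilon = 1$ ($e^1 \approx 2.72$) but fails for $\varepsilon = 0.1$ ($e^{0.1} \approx 1.105$). Any deterministic mechanism that reports distinct locations fails for every finite $\varepsilon$.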

Differential privacy can then be expressed as $\varepsilon d_h$-privacy. Moreover, different metrics give rise to various privacy notions of interest; several examples are given in [11].

2.3 Geo-indistinguishability 

In the context of location-based systems the secrets $\mathcal{X}$ are locations, and we can obtain a useful notion of location privacy by naturally using the Euclidean distance $d_2$, scaled by a security parameter $\varepsilon$. The resulting notion of $\varepsilon d_2$-privacy, called $\varepsilon$-geo-indistinguishability in [9], requires that a location obfuscation mechanism should produce similar results when applied to locations that are geographically close. This prevents the service provider from accurately inferring the user's location, while allowing him to get the approximate information required to provide the service. [9] studies this notion in detail, arguing that it provides a natural notion of location privacy that is independent of the prior information of the adversary.

Moreover, [9] shows that geo-indistinguishability can be achieved by adding noise to the user's location drawn from a 2-dimensional Laplace distribution. This can be easily done in polar coordinates by selecting an angle uniformly and a radius from a Gamma distribution. If a restricted set of reported locations is allowed, then the location produced by the mechanism can be mapped back to the closest among the allowed ones.
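A minimal sketch of this polar sampling, assuming planar coordinates in the same unit as $d_2$ and Python's standard `random` module (the function names are hypothetical). It uses the fact that, when the density is proportional to $e^{-\varepsilon d_2(x, z)}$, the radius has density $\varepsilon^2 r e^{-\varepsilon r}$, i.e. a Gamma distribution with shape 2 and scale $1/\varepsilon$.

```python
import math
import random


def planar_laplace(x, eps, rng=random):
    """Perturb point x with planar Laplace noise of parameter eps (polar method)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)    # angle: uniform on [0, 2*pi]
    r = rng.gammavariate(2.0, 1.0 / eps)        # radius: Gamma(shape=2, scale=1/eps)
    return (x[0] + r * math.cos(theta), x[1] + r * math.sin(theta))


def planar_laplace_discrete(x, eps, allowed, rng=random):
    """Sample as above, then map back to the closest allowed reported location."""
    z = planar_laplace(x, eps, rng)
    return min(allowed, key=lambda a: math.dist(a, z))
```

The expected radius is $2/\varepsilon$, so a smaller $\varepsilon$ (more privacy) yields larger perturbations on average. Passing a seeded `random.Random` instance as `rng` makes runs reproducible.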

Although the Laplace mechanism provides an easy and practical way of achieving geo-indistinguishability, independently of any user profile, its utility is not always optimal. In the next section we show that by tailoring a mechanism to a prior corresponding to a specific user profile, we can achieve better utility for that prior, while still satisfying geo-indistinguishability, i.e. a privacy guarantee independent of the prior. The evaluation results in Section 4 show that the optimal mechanism can provide substantial improvements compared to the Laplace mechanism.
3 Geo-indistinguishable mechanisms of optimal utility

As discussed in the introduction, our main goal is, given a set of locations $\mathcal{X}$ with a privacy metric $d_\mathcal{X}$ (typically the Euclidean distance), a privacy level $\varepsilon$, a user profile $\pi$ and a quality metric $d_Q$, to find an $\varepsilon d_\mathcal{X}$-private mechanism such that its SQL is as small as possible.

We start by describing a set of linear constraints that enforce $\varepsilon d_\mathcal{X}$-privacy, which allows us to obtain an optimal mechanism as the solution of a linear optimization problem. However, the number of constraints can be large, making the approach computationally demanding as the number of locations increases. As a consequence, we propose an approximate solution that replaces $d_\mathcal{X}$ with the metric induced by a spanning graph. We discuss a greedy algorithm to calculate the spanning graph and analyze its running time. Finally, we show that, if the quality and adversary metrics coincide, then the constructed (exact or approximate) mechanisms also provide optimal privacy in terms of AdvError.


3.1 Constructing an optimal mechanism 

The constructed mechanism is assumed to have as both input and output a predetermined finite set of locations $\mathcal{X}$. For instance, $\mathcal{X}$ can be constructed by dividing the map into a finite number of regions (of arbitrary size and shape), and selecting in $\mathcal{X}$ a representative location for each region. We also assume a prior $\pi$ over $\mathcal{X}$, representing the probability of the user being at each location at any given time.

Given a privacy metric $d_\mathcal{X}$ (typically the Euclidean distance) and a privacy parameter $\varepsilon$, the goal is to construct an $\varepsilon d_\mathcal{X}$-private mechanism $K$ such that the service quality loss with respect to a quality metric $d_Q$ is minimum. This property is formally defined below:

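Definition 3. Given a prior $\pi$, a privacy metric $d_\mathcal{X}$, a privacy parameter $\varepsilon$ and a quality metric $d_Q$, a mechanism $K$ is $\varepsilon d_\mathcal{X}$-OptSQL$(\pi, d_Q)$ iff:

1. $K$ is $\varepsilon d_\mathcal{X}$-private, and

2. for all mechanisms $K'$, if $K'$ is $\varepsilon d_\mathcal{X}$-private then

$$\mathrm{SQL}(K, \pi, d_Q) \le \mathrm{SQL}(K', \pi, d_Q)$$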

Note that $\varepsilon d_\mathcal{X}$-OptSQL optimizes SQL given a privacy constraint, while $q$-OptPriv (Definition 1) optimizes privacy given an SQL constraint.

In order for $K$ to be $\varepsilon d_\mathcal{X}$-private it should satisfy the following constraints:

$$k_{xz} \le e^{\varepsilon d_\mathcal{X}(x,x')}\, k_{x'z} \qquad x, x', z \in \mathcal{X}$$

Hence, we can construct an optimal mechanism by solving a linear optimization problem, minimizing $\mathrm{SQL}(K, \pi, d_Q)$ while satisfying $\varepsilon d_\mathcal{X}$-privacy:
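Minimize:

$$\sum_{x,z \in \mathcal{X}} \pi_x\, k_{xz}\, d_Q(x,z)$$

Subject to:

$$k_{xz} \le e^{\varepsilon d_\mathcal{X}(x,x')}\, k_{x'z} \qquad x, x', z \in \mathcal{X}$$

$$\sum_{z \in \mathcal{X}} k_{xz} = 1 \qquad x \in \mathcal{X}$$

$$k_{xz} \ge 0 \qquad x, z \in \mathcal{X}$$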

It is easy to see that the mechanism $K$ generated by the previous optimization problem is $\varepsilon d_\mathcal{X}$-OptSQL$(\pi, d_Q)$.
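To make the size of this program concrete, the sketch below assembles it in the generic form "minimize $c \cdot k$ subject to $A_{ub} k \le b_{ub}$, $A_{eq} k = b_{eq}$, $k \ge 0$", with $k_{xz}$ stored at index `x * n + z`. The function name and the Euclidean defaults are hypothetical. The output could be handed to an off-the-shelf LP solver (for instance SciPy's `scipy.optimize.linprog`, whose default variable bounds already encode $k \ge 0$), but no solver is invoked here. Constraints with $x = x'$ hold trivially and are skipped, leaving $|\mathcal{X}|^2(|\mathcal{X}|-1)$ inequality rows.

```python
import math
from itertools import product


def build_opt_sql_lp(prior, locs, eps, d_X=math.dist, d_Q=math.dist):
    """Assemble (c, A_ub, b_ub, A_eq, b_eq) for the eps*d_X-OptSQL linear program."""
    n = len(locs)

    def idx(x, z):
        return x * n + z                     # position of variable k_xz

    # Objective: sum_{x,z} pi_x * k_xz * d_Q(x, z)
    c = [prior[x] * d_Q(locs[x], locs[z]) for x, z in product(range(n), repeat=2)]

    # Privacy: k_xz - exp(eps * d_X(x, x')) * k_x'z <= 0 for every x != x' and z
    A_ub, b_ub = [], []
    for x, xp, z in product(range(n), repeat=3):
        if x == xp:
            continue
        row = [0.0] * (n * n)
        row[idx(x, z)] = 1.0
        row[idx(xp, z)] = -math.exp(eps * d_X(locs[x], locs[xp]))
        A_ub.append(row)
        b_ub.append(0.0)

    # Each row of K is a probability distribution: sum_z k_xz = 1
    A_eq, b_eq = [], []
    for x in range(n):
        row = [0.0] * (n * n)
        for z in range(n):
            row[idx(x, z)] = 1.0
        A_eq.append(row)
        b_eq.append(1.0)
    return c, A_ub, b_ub, A_eq, b_eq
```

For three locations this yields 9 variables, 18 privacy rows and 3 normalization rows; the uniform mechanism $k_{xz} = 1/|\mathcal{X}|$ is always feasible, which shows the program is never empty.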

3.2 A more efficient method using spanners 

In the optimization problem of the previous section, the $\varepsilon d_\mathcal{X}$-privacy definition introduces $|\mathcal{X}|^3$ constraints in the linear program. However, in order to be able to manage a large number of locations, we would like to reduce this amount to a number in the order of $O(|\mathcal{X}|^2)$. One possible way to achieve this is to use the dual form of the linear program (shown in the appendix). The dual program has as many constraints as variables in the primal program (in this case $|\mathcal{X}|^2$) and one variable for each constraint in the primal program (in this case $O(|\mathcal{X}|^3)$). Since the primal linear program is feasible and bounded, and therefore has an optimal solution, the strong duality theorem guarantees that the dual program also has an optimal solution, with the same optimal value. However, as shown in Section 4.2, in practice the dual program does not offer a substantial improvement with respect to the primal one (a possible explanation being that, although fewer in number, the constraints in the dual program are more complex, in the sense that they each involve a larger number of variables).

An alternative approach is to exploit the structure of the metric $d_\mathcal{X}$. So far we are not making any assumption about $d_\mathcal{X}$, and therefore we need to specify $|\mathcal{X}|$ constraints for each pair of locations $x$ and $x'$. However, it is worth noting that if the distance $d_\mathcal{X}$ is induced by a weighted graph (i.e. the distance between each pair of locations is the weight of a minimum path in a graph), then we only need to consider $|\mathcal{X}|$ constraints for each pair of locations that are adjacent in the graph, since the constraints for any other pair follow by chaining those of the edges along a minimum path. An example of this is the usual definition of differential privacy: since the adjacency relation between databases induces the Hamming distance $d_h$, we only need to require the differential privacy constraint for each pair of databases that are adjacent in the Hamming graph (i.e. that differ in one individual).

It might be the case, though, that the metric $d_\mathcal{X}$ is not induced by any graph (other than the complete graph), and consequently the number of constraints remains the same. In fact, this is generally the case for the Euclidean metric. Therefore, we consider the case in which $d_\mathcal{X}$ can be approximated by some graph-induced metric.
Fig. 1. (a) A division of the map of Paris into a 7 × 5 square grid. The set of locations $\mathcal{X}$ contains the centers of the regions. (b) A spanner of $\mathcal{X}$ with dilation $\delta = 1.08$. (c) Relation between the number of edges and the dilation for the presented spanner.


If $G$ is an undirected weighted graph, we denote with $d_G$ the distance function induced by $G$, i.e. $d_G(x, x')$ denotes the weight of a minimum path between the nodes $x$ and $x'$ in $G$. Then, if the set of nodes of $G$ is $\mathcal{X}$ and the weight of its edges is given by the metric $d_\mathcal{X}$, we can approximate $d_\mathcal{X}$ with $d_G$. In this case, we say that $G$ is a spanning graph, or a spanner [14,15], of $\mathcal{X}$.

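Definition 4 (Spanner). A weighted graph $G = (\mathcal{X}, E)$, with $E \subseteq \mathcal{X} \times \mathcal{X}$ and weight function $w : E \to \mathbb{R}$, is a spanner of $\mathcal{X}$ if

$$w(x, x') = d_\mathcal{X}(x, x') \qquad \forall (x, x') \in E$$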

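Note that if $G$ is a spanner of $\mathcal{X}$, then, by the triangle inequality for $d_\mathcal{X}$, no path in $G$ can be shorter than the direct distance, so

$$d_G(x, x') \ge d_\mathcal{X}(x, x') \qquad \forall x, x' \in \mathcal{X}$$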

A main concept in the theory of spanners is that of dilation, also known as 
stretch factor: 

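Definition 5 (Dilation). Let $G = (\mathcal{X}, E)$ be a spanner of $\mathcal{X}$. The dilation of $G$ is calculated as:

$$\delta = \max_{x \neq x' \in \mathcal{X}} \frac{d_G(x, x')}{d_\mathcal{X}(x, x')}$$

A spanner of $\mathcal{X}$ with dilation $\delta$ is also called a $\delta$-spanner of $\mathcal{X}$.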
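For small instances, $d_G$ and the dilation can be computed directly from the two definitions above. The sketch below uses the Floyd-Warshall all-pairs shortest-path algorithm and assumes distinct planar locations (so $d_\mathcal{X}(x, x') > 0$ for $x \neq x'$), an undirected edge list of index pairs, and Euclidean edge weights; the function names are hypothetical.

```python
import math
from itertools import combinations


def graph_distances(n, edges, weight):
    """All-pairs shortest paths d_G via Floyd-Warshall (O(n^3))."""
    D = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j in edges:
        w = weight(i, j)
        D[i][j] = D[j][i] = min(D[i][j], w)   # undirected edge
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


def dilation(locs, edges):
    """Max over x != x' of d_G(x, x') / d_X(x, x'), with Euclidean d_X."""
    n = len(locs)
    D = graph_distances(n, edges, lambda i, j: math.dist(locs[i], locs[j]))
    return max(D[i][j] / math.dist(locs[i], locs[j])
               for i, j in combinations(range(n), 2))
```

For the four corners of a unit square connected in a cycle, opposite corners are at graph distance 2 but Euclidean distance $\sqrt{2}$, so the dilation is $\sqrt{2}$; a disconnected graph has infinite dilation.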

Informally, a spanner of $\mathcal{X}$ with dilation $\delta$ can be considered as an approximation of the metric $d_\mathcal{X}$ in which distances between nodes are "stretched" by a factor of at most $\delta$. Spanners are generally used to approximate distances in a geographic network without considering the individual distances between each pair of nodes. An example of a spanner for a grid in the map can be seen in Figure 1.

If $G$ is a $\delta$-spanner of $\mathcal{X}$, then by the definition of dilation the following property holds:

$$d_G(x, x') \le \delta\, d_\mathcal{X}(x, x') \qquad \forall x, x' \in \mathcal{X}$$
With this property we can state the following proposition: 
Proposition 1. Let $\mathcal{X}$ be a set of locations with metric $d_\mathcal{X}$, and let $G$ be a $\delta$-spanner of $\mathcal{X}$. If a mechanism $K$ for $\mathcal{X}$ is $\frac{\varepsilon}{\delta} d_G$-private, then $K$ is $\varepsilon d_\mathcal{X}$-private.
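A short derivation (our reconstruction from the spanner property, not necessarily the paper's own proof): since $\frac{\varepsilon}{\delta} d_G(x,x') \le \frac{\varepsilon}{\delta}\,\delta\, d_\mathcal{X}(x,x') = \varepsilon d_\mathcal{X}(x,x')$, for all $x, x' \in \mathcal{X}$ and every reported $z$,

$$K(x)(z) \le e^{\frac{\varepsilon}{\delta} d_G(x,x')}\, K(x')(z) \le e^{\varepsilon d_\mathcal{X}(x,x')}\, K(x')(z)$$

where the first inequality is $\frac{\varepsilon}{\delta} d_G$-privacy and the second uses monotonicity of the exponential together with $K(x')(z) \ge 0$.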

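We can then propose a new optimization problem to obtain an $\varepsilon d_\mathcal{X}$-private mechanism. If $G = (\mathcal{X}, E)$ is a $\delta$-spanner of $\mathcal{X}$, we require not the constraints corresponding to $\varepsilon d_\mathcal{X}$-privacy, but those corresponding to $\frac{\varepsilon}{\delta} d_G$-privacy instead, that is, $|\mathcal{X}|$ constraints for each edge of $G$:

Minimize:

$$\sum_{x,z \in \mathcal{X}} \pi_x\, k_{xz}\, d_Q(x,z)$$

Subject to:

$$k_{xz} \le e^{\frac{\varepsilon}{\delta} d_G(x,x')}\, k_{x'z} \qquad z \in \mathcal{X},\ (x, x') \in E$$

$$\sum_{z \in \mathcal{X}} k_{xz} = 1 \qquad x \in \mathcal{X}$$

$$k_{xz} \ge 0 \qquad x, z \in \mathcal{X}$$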

Since the resulting mechanism is $\frac{\varepsilon}{\delta} d_G$-private, by Proposition 1 it must also be $\varepsilon d_\mathcal{X}$-private. However, the number of constraints induced by $\frac{\varepsilon}{\delta} d_G$-privacy is now $|E||\mathcal{X}|$. Moreover, as discussed in the next section, for any $\delta > 1$ there is an algorithm that generates a $\delta$-spanner with $O(\frac{|\mathcal{X}|}{\delta - 1})$ edges, which means that, fixing $\delta$, the total number of constraints of the linear program is $O(|\mathcal{X}|^2)$.
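The sketch below enumerates the constraints of this reduced program as tuples `(x, x', z, factor)`, meaning $k_{xz} \le \text{factor} \cdot k_{x'z}$. Since $G$ is undirected, it emits both directions of each edge, i.e. $2|E||\mathcal{X}|$ rows, which is still $O(|E||\mathcal{X}|)$. The function name and the representation of $d_G$ as a precomputed matrix are assumptions made for illustration.

```python
import math


def spanner_constraints(n, edges, d_G, eps, delta):
    """Constraints k_xz <= exp((eps/delta) * d_G(x, x')) * k_x'z for every edge and z."""
    cons = []
    for a, b in edges:
        factor = math.exp(eps / delta * d_G[a][b])
        for x, xp in ((a, b), (b, a)):        # undirected edge: both directions
            for z in range(n):
                cons.append((x, xp, z, factor))
    return cons


if __name__ == "__main__":
    n = 4                                      # four points on a line, unit spacing
    path = [(0, 1), (1, 2), (2, 3)]
    d_G = [[abs(i - j) for j in range(n)] for i in range(n)]
    reduced = spanner_constraints(n, path, d_G, eps=1.0, delta=1.0)
    full = n * (n - 1) * n                     # all ordered pairs x != x', times |X|
    print(len(reduced), full)                  # 24 48
```

Even on this tiny path the edge-only formulation halves the number of privacy rows; the gap grows with $|\mathcal{X}|$, since the full count is cubic while the reduced one is quadratic for a sparse spanner.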

It is worth noting that although $\varepsilon d_\mathcal{X}$-privacy is guaranteed, optimality is lost: the obtained mechanism is $\frac{\varepsilon}{\delta} d_G$-OptSQL$(\pi, d_Q)$ but not necessarily $\varepsilon d_\mathcal{X}$-OptSQL$(\pi, d_Q)$, since the set of $\frac{\varepsilon}{\delta} d_G$-private mechanisms is a subset of the set of $\varepsilon d_\mathcal{X}$-private mechanisms. The SQL of the obtained mechanism will now depend on the dilation $\delta$ of the spanner: the smaller $\delta$ is, the closer the SQL of the mechanism will be to the optimal one. However, if $\delta$ is too small then the number of edges of the spanner will be large, and therefore the number of constraints in the linear program will increase. In fact, when $\delta = 1$ the mechanism obtained is also $\varepsilon d_\mathcal{X}$-OptSQL$(\pi, d_Q)$ (since $d_G$ and $d_\mathcal{X}$ coincide), but the number of constraints is in general $O(|\mathcal{X}|^3)$. Consequently, there is a trade-off between the accuracy of the approximation and the number of constraints in the linear program.

3.3 An algorithm to construct a $\delta$-spanner

The previous approach requires computing a spanner for $\mathcal{X}$. Moreover, given a dilation factor $\delta$, we are interested in generating a $\delta$-spanner with a reasonably small number of edges. In this section we describe a simple greedy algorithm to get a $\delta$-spanner of $\mathcal{X}$, presented in [14]. This procedure (described in Algorithm 1) is a generalization of Kruskal's minimum spanning tree algorithm.

The idea of the algorithm is the following: we start with a spanner with an empty set of edges (lines 2-3). In the main loop we consider all possible edges (that is, all pairs of locations) in increasing order with respect to the distance function $d_\mathcal{X}$ (lines 4-8), and if the weight of a minimum path between the two corresponding locations in the current graph is bigger than $\delta$ times the distance between them, we add the edge to the spanner. By construction, at the end of the procedure, graph $G$ is a $\delta$-spanner of $\mathcal{X}$.

Algorithm 1 Algorithm to get a $\delta$-spanner of $\mathcal{X}$

1: procedure GetSpanner($\mathcal{X}$, $d_\mathcal{X}$, $\delta$)
2:     $E := \emptyset$
3:     $G := (\mathcal{X}, E)$
4:     for all $(x, x') \in \mathcal{X} \times \mathcal{X}$ do    ▷ taken in increasing order wrt $d_\mathcal{X}$
5:         if $d_G(x, x') > \delta\, d_\mathcal{X}(x, x')$ then
6:             $E := E \cup \{(x, x')\}$
7:         end if
8:     end for
9:     return $G$
10: end procedure

A crucial result presented in [13] is that, in the case where $\mathcal{X}$ is a set of points in the Euclidean plane, the degree of each node in the generated spanner only depends on the dilation factor:

Theorem 1. Let $\delta > 1$. If $G$ is a $\delta$-spanner for $\mathcal{X} \subseteq \mathbb{R}^2$, with the Euclidean distance $d_2$ as metric, then the degree of each node in the spanner constructed by Algorithm 1 is $O(\frac{1}{\delta - 1})$.

This result is useful to estimate the total number of edges in the spanner, since our goal is to generate a sparse spanner, i.e. a spanner with $O(|\mathcal{X}|)$ edges.

Considering the running time of the algorithm, since the main loop requires all pairs of regions to be sorted increasingly by distance, we need to perform this sorting before the loop. This step takes $O(|\mathcal{X}|^2 \log |\mathcal{X}|)$. The main loop performs a minimum-path calculation in each step, with $|\mathcal{X}|^2$ total steps. If we use, for instance, Dijkstra's algorithm, each of these operations can be done in $O(|E| + |\mathcal{X}| \log |\mathcal{X}|)$. If we select $\delta$ so that the final number of edges in the spanner is linear, i.e. $|E| = O(|\mathcal{X}|)$, we can conclude that the total running time of the main loop is $O(|\mathcal{X}|^3 \log |\mathcal{X}|)$. This turns out to be also the complexity of the whole algorithm.

A common problem in the theory of spanners is the following: given a set of points $\mathcal{X} \subseteq \mathbb{R}^2$ and a maximum number of edges $m$, the goal is to find the spanner with minimum dilation with at most $m$ edges. This has been proven to be NP-hard ([IS]). In our case, we are interested in the analogous problem: given a maximum tolerable dilation factor $\delta$, we want to find a $\delta$-spanner with the minimum number of edges. However, we can see that the first problem can be expressed in terms of the second (for instance, with a binary search on the dilation factor), which means that the second problem must be NP-hard as well.
3.4 AdvError of the obtained mechanism

As discussed in Section 2.1, the privacy of a location obfuscation mechanism can be expressed in terms of AdvError for an adversary metric $d_A$. In [8], the problem of optimizing privacy for a given SQL constraint is studied, providing a method to obtain a $q$-OptPriv$(\pi, d_A, d_Q)$ mechanism for any $q, \pi, d_Q, d_A$.

In our case, we optimize SQL for a given privacy constraint, constructing an $\varepsilon d_\mathcal{X}$-OptSQL$(\pi, d_Q)$ mechanism. We now show that, if $d_Q$ and $d_A$ coincide, the mechanism generated by either of the two optimization problems of the previous sections is also $q$-OptPriv$(\pi, d_Q, d_Q)$.

AdvError corresponds to the attacker's remapping $H$ that minimizes its expected error with respect to the metric $d_A$ and its prior knowledge $\pi$. The composition $KH$ of a mechanism and its remapping is itself a mechanism. The following result shows that remapping does not violate $d_\mathcal{X}$-privacy.

Lemma 1. Let $K$ be a $d_\mathcal{X}$-private mechanism, and let $H$ be a remapping. Then $KH$ is $d_\mathcal{X}$-private.
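A one-line argument (our reconstruction, assuming only that the entries of $H$ are non-negative): for all $x, x' \in \mathcal{X}$ and every guessed location $\hat{x}$,

$$(KH)(x)(\hat{x}) = \sum_z k_{xz}\, h_{z\hat{x}} \le \sum_z e^{d_\mathcal{X}(x,x')}\, k_{x'z}\, h_{z\hat{x}} = e^{d_\mathcal{X}(x,x')}\, (KH)(x')(\hat{x})$$

where the inequality applies $d_\mathcal{X}$-privacy of $K$ term by term, each term being multiplied by $h_{z\hat{x}} \ge 0$.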

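In the case in which the metric $d_A$ used by the adversary and the quality metric $d_Q$ coincide, we can see that no remapping is needed by the adversary: the mechanism $K$ obtained by solving the optimization problem is $d_\mathcal{X}$-OptSQL$(\pi, d_Q)$, and then, by Lemma 1, no remapping $H$ can decrease the SQL:

$$\mathrm{SQL}(K, \pi, d_Q) \le \mathrm{SQL}(KH, \pi, d_Q) \qquad \forall H$$

This, in turn, implies that SQL and AdvError coincide for $K$. Moreover, for any mechanism $K'$, choosing the identity remapping gives $\mathrm{AdvError}(K', \pi, d_Q) \le \mathrm{SQL}(K', \pi, d_Q)$; hence every $K'$ with SQL at most $q = \mathrm{SQL}(K, \pi, d_Q)$ satisfies $\mathrm{AdvError}(K', \pi, d_Q) \le q = \mathrm{AdvError}(K, \pi, d_Q)$, and therefore $K$ must be $q$-OptPriv$(\pi, d_Q, d_Q)$.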

Theorem 2. If a mechanism $K$ is $d_\mathcal{X}$-OptSQL$(\pi, d_Q)$ then it is also $q$-OptPriv$(\pi, d_Q, d_Q)$ for $q = \mathrm{SQL}(K, \pi, d_Q)$.

It is important to note that Theorem 2 holds for any metric $d_\mathcal{X}$. This means that both mechanisms obtained as a result of the optimization problems presented in Sections 3.1 and 3.2 are $q$-OptPriv$(\pi, d_Q, d_Q)$, since they are $\varepsilon d_\mathcal{X}$-OptSQL$(\pi, d_Q)$ and $\frac{\varepsilon}{\delta} d_G$-OptSQL$(\pi, d_Q)$ respectively, although each for its own value of $q$. In fact, in contrast to the method of [8], in which the quality bound $q$ is given as a parameter, our method optimizes the SQL given a privacy bound. Hence, the resulting mechanism will be $q$-OptPriv$(\pi, d_Q, d_Q)$, but for a $q$ that is not known in advance (and will depend on the privacy constraint $\varepsilon$ and the dilation factor $\delta$): the higher the value of $\varepsilon$ (i.e. the lower the privacy), the lower $q$ will be. Similarly, for a fixed $\varepsilon$, the lower the value of $\delta$ (i.e. the better the approximation), the lower the SQL of $K$.

Finally, we must remark that this result only holds when the metrics $d_Q$ and $d_A$ coincide. If the metrics differ, the property no longer holds. For example, the quality may be measured in terms of the Euclidean distance (the user is interested in accuracy) while the adversary uses the binary distance (he is only interested in the exact location).
Fig. 2. (a) Division of the map of Beijing into regions of size 0.658 × 0.712 km. The density of each region represents how often individuals from the dataset visit it. (b) The 50 selected regions. These are the regions with the highest density among the whole set of regions.

4 Evaluation 

Given a set of locations $\mathcal{X}$ in the map with a corresponding metric $d_\mathcal{X}$, a dilation factor $\delta$, a user $u$ with user profile $\pi$ and a privacy constraint $\epsilon$, we can obtain a $\frac{\epsilon}{\delta} d_G$-OptSQL$(\pi, d_Q)$ mechanism for a given quality metric $d_Q$. In this section we evaluate the location privacy provided by our mechanisms and compare them with the mechanism of Shokri et al. We construct the mechanisms under different user profiles and compare the privacy they offer under different prior distributions.

In order to perform a realistic analysis, we construct the different user profiles and prior distributions using traces from real users, collected in the GeoLife GPS Trajectories dataset ([17], [18], [19]). This dataset contains 17621 traces from 182 users, moving mainly in the north-west of Beijing, China, over a period of more than five years (from April 2007 to August 2012). The traces show users performing routine tasks (like going to and from work), and also traveling, shopping, and engaging in other kinds of entertainment or unusual activities. The traces were also logged using different means of transportation, such as walking, public transport or bike. More than 90% of the traces were logged in a dense representation, which means that the individual points in the trace were reported every 1-5 seconds or every 5-10 meters.

For the purposes of evaluation, we divide the map of Beijing into a grid of regions 0.6583 km wide and 0.7116 km high (Figure 2(a)). We measure the "popularity" of each region with respect to all the individuals of the dataset as follows:

— For each user, we calculate the number of points in the traces of this user that fall in each region. We refer to this number as the rank of the region. We then select the 30 regions with the highest ranks for each user (without considering regions with a rank of 0).

— For each region, we assign a score: the number of users that have this region among their highest-ranked ones. The 50 selected regions are those with the highest scores.
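The scoring procedure above can be sketched as follows. This is our own minimal reconstruction. It assumes the points have already been projected to planar kilometre coordinates, and the helper `region_of` is hypothetical. The text does not discuss ties between equally ranked or equally scored regions, so the sketch breaks them by first occurrence.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

Region = Tuple[int, int]


def region_of(x_km: float, y_km: float,
              width_km: float = 0.6583, height_km: float = 0.7116) -> Region:
    """Grid cell of a point given in (hypothetical) planar kilometre coordinates."""
    return (int(x_km // width_km), int(y_km // height_km))


def select_regions(regions_by_user: Dict[Hashable, Iterable[Region]],
                   top_per_user: int = 30, n_selected: int = 50) -> List[Region]:
    """Score regions by how many users have them among their top-ranked ones."""
    score: Counter = Counter()
    for visited in regions_by_user.values():
        rank = Counter(visited)  # rank = number of the user's points in each region
        # most_common only lists regions that occur, so rank-0 regions never count
        for region, _ in rank.most_common(top_per_user):
            score[region] += 1
    return [region for region, _ in score.most_common(n_selected)]
```

Each user contributes at most one point to a region's score, so very active users do not dominate the selection.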

Figure 2(a) shows the division of the map into regions, with the opacity representing the score of each of them, while Figure 2(b) shows the 50 regions with the highest scores. Most of the selected regions are located in the south-east of the Haidian district, and all of them are located in the north-west of Beijing. We take the set of locations $\mathcal{X}$ to be the centers of the selected regions, and the metric $d_\mathcal{X}$ to be the Euclidean distance between these centers, i.e. $d_\mathcal{X} = d_2$.

4.1 Comparing privacy 

Given a user $u$ with profile $\pi$, we are interested in comparing the privacy guarantees offered by the different mechanisms presented in previous sections, under different kinds of prior information that might be available to the attacker. Since these profiles and prior distributions need to accurately reflect the distribution of the users over the map, we only consider users with more than 100 traces logged in the dataset (23% of the individuals in the dataset meet this criterion) within a time period of more than one month but less than one year (40% of the total number of users). From these traces, and in order to only take into account the "recent" behaviour of the user (as in the experiments performed in [5]), we only consider those corresponding to the last month in which the user logged some activity.

The evaluation process is performed as follows. We first randomly select a user $u$ meeting the criteria described above and generate the user profile $\pi$ of this user as follows:

— For each region, we count how many traces include a point inside that region (if a trace has several points in the same region, it is counted only once).

— We normalize these counts to obtain a probability distribution.
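A minimal sketch of the profile construction. It assumes that each trace is given as the sequence of regions its points fall into, and that regions outside the selected set are simply ignored. Both representation choices are our own assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def user_profile(traces: Iterable[Iterable[Hashable]],
                 regions: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Profile pi: fraction of trace visits that fall in each selected region."""
    selected = set(regions)
    counts: Counter = Counter()
    for trace in traces:
        # a trace contributes at most once per region, however many points it has there
        counts.update(set(trace) & selected)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no trace visits any of the selected regions")
    return {r: counts[r] / total for r in regions}
```

For example, with traces `[["a", "a", "b"], ["b"], ["c"]]` and selected regions `["a", "b"]`, region `"a"` is visited by one trace and `"b"` by two, giving probabilities 1/3 and 2/3.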

Then, in order to perform a fair comparison, we construct the mechanisms so that their SQLs coincide. The first step is to select a privacy level $\epsilon$ and a dilation $\delta$, and then construct the mechanism described in Section 3.2. We call this mechanism OptSQL. We then set $q = \mathrm{SQL}(\text{OptSQL}, \pi, d_2)$ and construct the mechanism described in [5], fixing the SQL to $q$. We call this mechanism OptPriv. Finally, we compute a discretized version of the Planar Laplacian mechanism of Andrés et al. [9] under a privacy constraint $\epsilon'$, where $\epsilon'$ is selected so that the SQL of this mechanism is also $q$. We call this mechanism PL.
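The calibration of $\epsilon'$ can be done by a one-dimensional search, since the SQL of a Laplace-style mechanism decreases as its privacy parameter grows. The sketch below uses bisection. To stay self-contained it substitutes a simple stand-in mechanism with $K(x,z) \propto e^{-\epsilon' d(x,z)}$ over the finite set of locations. This stand-in is not the discretized Planar Laplacian of [9]; only the search strategy is meant to carry over. The function names are our own.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def laplace_like(d: Sequence[Sequence[float]], eps: float) -> Matrix:
    """Stand-in mechanism: K[x][z] proportional to exp(-eps * d(x, z))."""
    K = []
    for row in d:
        weights = [math.exp(-eps * dz) for dz in row]
        total = sum(weights)
        K.append([w / total for w in weights])
    return K


def sql(K: Matrix, prior: Sequence[float], d: Sequence[Sequence[float]]) -> float:
    n = len(prior)
    return sum(prior[x] * K[x][z] * d[x][z] for x in range(n) for z in range(n))


def calibrate_eps(target_q: float, prior: Sequence[float],
                  d: Sequence[Sequence[float]],
                  lo: float = 1e-6, hi: float = 100.0, iters: int = 100) -> float:
    """Bisection for eps' such that SQL(eps') = target_q.

    Relies on SQL being non-increasing in eps' (smaller eps' means more noise).
    """
    def loss(eps: float) -> float:
        return sql(laplace_like(d, eps), prior, d)

    if not loss(hi) <= target_q <= loss(lo):
        raise ValueError("target SQL is outside the range reachable in [lo, hi]")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if loss(mid) > target_q:
            lo = mid  # too much noise: eps' must grow
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection only needs monotonicity of the SQL in $\epsilon'$, not a closed form. Any mechanism family with that property could be calibrated to the target $q$ the same way.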

Fig. 3. Privacy of the mechanisms for two different users and under four different priors. (a) Comparison for $u_1$, with $\epsilon = 0.5$, and SQL set at 1.31 km. (b) Comparison for $u_1$, with $\epsilon = 1.07$, and SQL set at 1 km. (c) Comparison for $u_2$, with $\epsilon = 0.5$, and SQL set at 1.45 km. (d) Comparison for $u_2$, with $\epsilon = 1.07$, and SQL set at 0.98 km.

Our goal is to compare the location privacy offered by these three mechanisms. We note, however, that location privacy mechanisms in general do not satisfy $\epsilon d_\mathcal{X}$-privacy unless they are specifically designed to do so. Therefore, we measure privacy using the metric AdvError, proposed in [5] and described in Section 2.1, which measures the expected error of the attacker under a given prior distribution. We also compare the privacy offered by these mechanisms under different prior distributions. These distributions represent the information the attacker might have about the likelihood of the user being in each of the considered regions. This prior knowledge might come not only from previous traces of the user, but also from other kinds of side information accessible to the attacker. For instance, if the movement patterns of the user in the morning differ from those at night, then the attacker can improve its prior distribution by considering only the traces logged in the morning. In this evaluation, we consider four different prior distributions. The first is derived from all the traces of the user in the considered month and therefore coincides with the user profile. The other three are derived from the traces in the morning (from 7am to noon), in the afternoon (from noon to 7pm) and at night (from 7pm to 7am).
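One way to derive the four priors is sketched below. The text does not spell out how a trace is assigned to a time window. This sketch therefore assumes that each point is classified by the hour of its timestamp and that, as for the user profile, each trace counts at most once per region. With the "all-day" window it then reproduces the user profile. The names `WINDOWS` and `time_prior` are illustrative.

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

Point = Tuple[datetime, Hashable]  # (timestamp, region the point falls in)

# Time windows from the text; assigning points by hour of day is our assumption.
WINDOWS: Dict[str, Callable[[int], bool]] = {
    "all-day": lambda h: True,
    "morning": lambda h: 7 <= h < 12,      # 7am to noon
    "afternoon": lambda h: 12 <= h < 19,   # noon to 7pm
    "night": lambda h: h >= 19 or h < 7,   # 7pm to 7am
}


def time_prior(traces: Iterable[List[Point]], regions: Sequence[Hashable],
               window: str) -> Dict[Hashable, float]:
    """Prior over the selected regions, restricted to points in one time window."""
    in_window = WINDOWS[window]
    selected = set(regions)
    counts: Counter = Counter()
    for trace in traces:
        visited = {r for t, r in trace if r in selected and in_window(t.hour)}
        counts.update(visited)  # each trace counts at most once per region
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no visits to the selected regions in window {window!r}")
    return {r: counts[r] / total for r in regions}
```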

Figure 3 shows the location privacy (in kilometers) offered by the three mechanisms under the four prior distributions mentioned above, for two randomly selected individuals in the dataset, whom we will refer to as $u_1$ and $u_2$. For each user we perform two experiments, differing in the privacy constraint used to construct the mechanism OptSQL. In the first we set $\epsilon = 0.5$ (Figures 3(a) and 3(c)), while in the second we set $\epsilon = 1.07$ (Figures 3(b) and 3(d)). In all four cases and for all priors considered, the location privacy offered by OptSQL is higher than that of the other mechanisms. The only exception is the all-day prior, which is the one used in the construction of the mechanisms. For this prior, as explained in Section 3.4, OptSQL and OptPriv are both $q$-OptPriv$(\pi, d_2, d_2)$ and therefore offer the same privacy.

Fig. 4. (a) Relation between SQL and dilation for the four variants of the mechanism OptSQL with privacy constraints $\epsilon = 0.5$ and $\epsilon = 1.07$, constructed under the user profiles of $u_1$ and $u_2$. (b) Relation between the number of constraints in the optimization problem and the dilation of the spanner for the set of considered locations. The spanner is calculated with the greedy algorithm presented in Section 3.3.

It can also be observed that in some cases, as in Figure 3, the location privacy for a more specific prior distribution, such as the morning prior, can be higher than the privacy for a more general one, in this case the all-day prior. This has a simple explanation. Since the prior used to construct the mechanism differs from the one being considered, the mechanism might become less accurate (i.e. it might have a higher SQL) for the more specific prior. As a consequence, the expected error of the attacker could increase.


4.2 Performance of the approximation algorithm 

We recall from Section 3.2 that, if we consider a large number of locations in $\mathcal{X}$, the number of constraints in the linear program might be high. Hence, we introduced a method based on a spanning graph $G$ to reduce the total number of constraints of the linear program. However, the obtained mechanism might no longer be $\epsilon d_\mathcal{X}$-OptSQL$(\pi, d_Q)$, and in general it has a higher SQL than the optimal one.

In this section we study how this approximation affects the utility of our mechanism. We construct the mechanism OptSQL for users $u_1$ and $u_2$ under privacy constraints $\epsilon = 0.5$ and $\epsilon = 1.07$, and measure the SQL for different values of the dilation $\delta$ of the spanner. These range from 1 (which means that $d_\mathcal{X}$ and $d_G$ are the same) to 2. The results are shown in Figure 4(a). In each case the SQL increases steadily, but the rate of increase varies from case to case. Recall that in the experiments of Section 4.1 the dilation factor was set at $\delta = 1.1$. At this dilation, the increase in the SQL ranges from 18.6 meters in the best case ($u_1$ with $\epsilon = 0.5$) to 56 meters in the worst ($u_2$ with $\epsilon = 1.07$), which represent increases of 1.45% and 6.1% respectively. We conclude that a reasonably small dilation does not produce a significant decrease in utility.

Table 1. Execution times of our approach for 50 and 75 locations, for different values of $\delta$, and using different methods to solve the linear program. X: stopped after one hour; Error: terminated with an error before one hour; NI: numerical instability. OptPriv does not depend on $\delta$ and is reported once per number of locations.

| Locations | δ | Primal simplex, primal LP | Primal simplex, dual LP | Dual simplex, primal LP | Dual simplex, dual LP | Interior-point, primal LP | Interior-point, dual LP | OptPriv |
|---|---|---|---|---|---|---|---|---|
| 50 | 1.0 | 57s | X | 40s | 45s | 49m 20s | NI | 1m |
| 50 | 1.1 | 46.4s | 5.2s | 5.9s | 15.5s | 7.5s | NI | |
| 50 | 1.2 | 4m 37s | 2s | 4s | X | 2.7s | NI | |
| 50 | 1.5 | 2s | 1s | 2s | 3s | 0.5s | NI | |
| 50 | 2.0 | Error | 1s | 2s | 2s | 0.5s | NI | |
| 75 | 1.0 | X | X | 29m 26s | X | X | NI | 11m |
| 75 | 1.1 | X | Error | 1m 12s | 2m 19s | 55s | NI | |
| 75 | 1.2 | X | Error | 42s | 48.4s | 11.7s | NI | |
| 75 | 1.5 | X | 5m 55s | 19.2s | X | 2.2s | NI | |
| 75 | 2.0 | X | 21.8s | 27.2s | 15.5s | 1.7s | NI | |

Since the goal of using an approximate distance was to reduce the total number of constraints in the optimization problem, we now study the relation between this number and the dilation of the spanner. The number of constraints is independent of the user profile and the privacy level $\epsilon$. It is therefore enough to consider only one of the variants of OptSQL presented before. The results, shown in Figure 4(b), show that the number of constraints decreases sharply from $\delta = 1$ to $\delta = 1.1$, and more gradually after that. For this particular set of locations $\mathcal{X}$, the optimization problem for $\delta = 1$ has 87750 constraints, while for $\delta = 1.1$ this number decreases to 21150. This is a 76% reduction in the number of constraints of the linear program. A dilation of 1.1 therefore gives a large reduction in the number of constraints without affecting the SQL of the generated mechanism too much.
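To make the relation between dilation and constraint count tangible, the following sketch builds a spanner with the classic greedy construction. It scans pairs of locations by increasing distance and adds an edge whenever the current graph distance exceeds $\delta$ times the Euclidean distance. We assume this is close to the greedy algorithm of Section 3.3. The constraint count assumes one constraint of type (1) per direction of each edge and per output location, plus one normalization constraint per location; this counting convention is also our assumption. `max_dilation` checks the property $d_G \le \delta d_\mathcal{X}$ on which Proposition 1 relies.

```python
import heapq
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Adjacency = Dict[int, List[Tuple[int, float]]]


def graph_distance(adj: Adjacency, src: int, dst: int) -> float:
    """Shortest-path distance in the weighted graph (Dijkstra)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == dst:
            return du
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if du + w < dist.get(v, math.inf):
                dist[v] = du + w
                heapq.heappush(heap, (du + w, v))
    return math.inf


def greedy_spanner(points: Sequence[Point], delta: float) -> List[Tuple[int, int]]:
    """Greedy delta-spanner: scan pairs by distance, add an edge only when needed."""
    adj: Adjacency = {i: [] for i in range(len(points))}
    edges = []
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda p: math.dist(points[p[0]], points[p[1]]))
    for i, j in pairs:
        dij = math.dist(points[i], points[j])
        # small relative tolerance so collinear points are not connected twice
        if graph_distance(adj, i, j) > delta * dij * (1 + 1e-9):
            adj[i].append((j, dij))
            adj[j].append((i, dij))
            edges.append((i, j))
    return edges


def max_dilation(points: Sequence[Point], edges: List[Tuple[int, int]]) -> float:
    """Largest ratio d_G(x, x') / d_X(x, x') over all pairs of distinct points."""
    adj: Adjacency = {i: [] for i in range(len(points))}
    for i, j in edges:
        w = math.dist(points[i], points[j])
        adj[i].append((j, w))
        adj[j].append((i, w))
    return max(graph_distance(adj, i, j) / math.dist(points[i], points[j])
               for i, j in combinations(range(len(points)), 2))


def num_constraints(n_locations: int, n_edges: int) -> int:
    """Type-(1) constraints for both directions of every edge and every z,
    plus one normalization constraint per location (assumed convention)."""
    return 2 * n_edges * n_locations + n_locations
```

With the figures above, the reduction is $(87750 - 21150)/87750 \approx 0.76$, i.e. the 76% quoted in the text.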

Finally, we measure the running time of the method used to generate OptSQL under different approaches to solving the linear optimization problem. The experiments were performed on a 2.8 GHz Intel Core i7 MacBook Pro with 8 GB of RAM running Mac OS X 10.9.1. The source code for the method was written in C++ and uses the routines in the GLPK library for the linear program. We compare the performance of three methods included in the library: the simplex method in both its primal and dual form, and the primal-dual interior-point method. We run these methods on both the primal linear program presented in Section 3.2 and its dual form, presented in Appendix B. Since the running time depends mainly on the number of locations being considered, the experiments focus on user $u_1$ with privacy level $\epsilon = 1.07$. The results are shown in Table 1. Some fields are marked with X, meaning that the execution took more than one hour, after which it was stopped. Others are marked with "Error", meaning that the execution stopped before one hour with an error.⁵ A particular case of error occurred when running the interior-point method on the dual linear program, where all executions ended with a "numerical instability" error (marked NI). From the results we can observe that:

— The only two methods that behave consistently (they never finish with an error, and their running time increases when the dilation decreases) are the dual simplex and the interior-point methods, both when applied to the primal program.

— Of these, the interior-point method performs better for larger dilations, while it performs much worse for very small ones.

— Somewhat surprisingly, the dual linear program does not offer a significant performance improvement, especially when compared with the interior-point method.

In the case of OptPriv, the mechanism is generated using Matlab's linear programming solver (source code kindly provided by the authors of [8]). For dilations of 1.1 and up, our method is faster.

5 Conclusion and related work 

Related work. In recent years, a large number of location-privacy protection techniques, diverse both in nature and goals, have been proposed and studied. Many of these aim at allowing the user of an LBS to hide his identity from the service provider. Several approaches are based on the notion of k-anonymity [20, 21, 22], requiring that the attacker cannot distinguish a user from at least $k-1$ other users. Others let users interact with the system under pseudonyms and provide regions (mix zones [4, 6]) where users can change their pseudonyms without being traced by the system. All these approaches are incomparable with ours, since ours aims at hiding the location of the user and not his identity.

Many approaches to location privacy are based on obfuscating the position of the user. A common technique for this purpose is cloaking [23, 24, 25, 21], which consists in blurring the user's location by reporting a region to the service provider. Another technique is based on adding dummy locations [26, 27, 5] to the request sent to the service provider. In order to preserve privacy, these dummy locations should be generated so that they look equally likely to be the user's real position. Collaborative models have also been proposed. In these, privacy is achieved with a peer-to-peer scheme where users avoid querying the service provider whenever they can find the requested information among their peers [28]. Finally, [29] presents a technique to generate optimal mechanisms under bandwidth and quality constraints. The obtained mechanisms can be based on dummy locations, cloaking or simple obfuscation.

5 The actual error message in this case was: "Error: unable to factorize the basis matrix (1). Sorry, basis recovery procedure not implemented yet".

Differential privacy has also been used in the context of location privacy. However, it is generally used to protect aggregate location information. For instance, [30] presents a way to statistically simulate the location data from a database while providing privacy guarantees. In [31], a quadtree spatial decomposition technique is used to achieve differential privacy in a database with location pattern mining capabilities. Still, some works propose the use of differential privacy to hide just an individual's location information. Dewri [32] proposes a combination of differential privacy and k-anonymity. It requires that the distances between the probability distributions corresponding to k fixed locations (the anonymity set) should not be greater than the privacy parameter $\epsilon$.

In [33] the authors use the same generalized notion of differential privacy used in this paper to construct a fair mechanism that produces similar reported values for "similar" users. Here, the similarity between users is captured by the metric used in the generalization. As in this paper, the mechanism is obtained by solving an optimization problem. However, no technique is used to reduce the number of constraints of the linear program.

Conclusion. In this paper we have developed a method to generate a mechanism for location privacy that combines the advantages of the geo-indistinguishability privacy guarantee of [9] and the optimal mechanism of [8]. Since linear optimization is computationally demanding, we have provided a technique to reduce the total number of constraints in the linear program. It uses a spanning graph to approximate distances between locations and allows a large reduction in the number of constraints with only a small decrease in utility. Finally, we have evaluated the proposed approach using traces from real users, and we have compared both the privacy and the running time of our mechanism with those of [8]. Our mechanism offers better privacy guarantees when the side knowledge of the attacker differs from the distribution used to construct the mechanisms. Moreover, we have shown that, for a reasonably good approximation factor, our approach performs much better in terms of running time.
A Proofs

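Proposition 1. Let $\mathcal{X}$ be a set of locations with metric $d_\mathcal{X}$, and let $G$ be a $\delta$-spanner of $\mathcal{X}$. If a mechanism $K$ for $\mathcal{X}$ is $\frac{\epsilon}{\delta} d_G$-private, then $K$ is $\epsilon d_\mathcal{X}$-private.

Proof. This proposition is a direct consequence of the property

$$d_G(x, x') \le \delta\, d_\mathcal{X}(x, x') \qquad \forall x, x' \in \mathcal{X}$$

and of one of the results presented in [11]. That result states that if two metrics $d_\mathcal{X}$ and $d_\mathcal{Y}$ are such that $d_\mathcal{X} \le d_\mathcal{Y}$ (point-wise), then $d_\mathcal{X}$-privacy implies $d_\mathcal{Y}$-privacy. The claim follows by applying it to $\frac{\epsilon}{\delta} d_G \le \epsilon d_\mathcal{X}$, which is the property above multiplied by $\frac{\epsilon}{\delta}$. □

Lemma 1. Let $K$ be a $d_\mathcal{X}$-private mechanism, and let $H$ be a remapping. Then $KH$ is $d_\mathcal{X}$-private.

Proof. Since $K$ is $d_\mathcal{X}$-private, we know that

$$k_{xz} \le e^{d_\mathcal{X}(x,x')}\, k_{x'z} \qquad \forall x, x', z \in \mathcal{X}$$

Therefore, given $x, x' \in \mathcal{X}$, it holds for all $z \in \mathcal{X}$ (using $h_{yz} \ge 0$) that

$$(KH)_{xz} = \sum_{y \in \mathcal{X}} k_{xy}\, h_{yz} \le \sum_{y \in \mathcal{X}} e^{d_\mathcal{X}(x,x')}\, k_{x'y}\, h_{yz} = e^{d_\mathcal{X}(x,x')}\, (KH)_{x'z}$$

and therefore $KH$ is $d_\mathcal{X}$-private. □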
Theorem 2. If a mechanism $K$ is $d_\mathcal{X}$-OptSQL$(\pi, d_Q)$ then it is also $q$-OptPriv$(\pi, d_Q, d_Q)$ for $q = \mathrm{SQL}(K, \pi, d_Q)$.

Proof. Let $d_A = d_Q$. We recall from Section 2.1 that for an arbitrary mechanism $M$ it holds that

$$\mathrm{AdvError}(M, \pi, d_Q) = \min_H \mathrm{ExpDist}(MH, \pi, d_Q) = \min_H \mathrm{SQL}(MH, \pi, d_Q)$$

which, taking $H$ to be the identity, means that

$$\mathrm{AdvError}(M, \pi, d_Q) \le \mathrm{SQL}(M, \pi, d_Q) \qquad (1)$$

Let $K$ be a $d_\mathcal{X}$-OptSQL$(\pi, d_Q)$ mechanism. Suppose that

$$\mathrm{AdvError}(K, \pi, d_Q) < \mathrm{SQL}(K, \pi, d_Q)$$

This means that there is a remapping $H$, other than the identity, such that

$$\mathrm{SQL}(KH, \pi, d_Q) < \mathrm{SQL}(K, \pi, d_Q)$$

However, by Lemma 1 we know that $KH$ is also $d_\mathcal{X}$-private. Recalling Definition 3, $K$ would then not be $d_\mathcal{X}$-OptSQL$(\pi, d_Q)$, which is a contradiction. Therefore

$$\mathrm{AdvError}(K, \pi, d_Q) = \mathrm{SQL}(K, \pi, d_Q) \qquad (2)$$

Now, in order to see that $K$ is also $q$-OptPriv$(\pi, d_Q, d_Q)$ with $q = \mathrm{SQL}(K, \pi, d_Q)$, let $K'$ be such that

$$\mathrm{SQL}(K', \pi, d_Q) \le \mathrm{SQL}(K, \pi, d_Q) \qquad (3)$$

According to Definition 1 we need to prove that

$$\mathrm{AdvError}(K', \pi, d_Q) \le \mathrm{AdvError}(K, \pi, d_Q)$$

And in fact

$$
\begin{aligned}
\mathrm{AdvError}(K', \pi, d_Q) &\le \mathrm{SQL}(K', \pi, d_Q) && \text{by (1)}\\
&\le \mathrm{SQL}(K, \pi, d_Q) && \text{by (3)}\\
&= \mathrm{AdvError}(K, \pi, d_Q) && \text{by (2)}
\end{aligned}
$$

which concludes our proof. □
B Dual form of the optimization problem


In this section we show the dual form of the optimization problem presented in Section 3.2. To obtain the dual form we need one variable for each constraint of the original linear program that is not a constraint on a single variable (i.e. for every constraint except the non-negativity ones). We recall that the original linear program is as follows:

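$$
\begin{aligned}
\text{Minimize:}\quad & \sum_{x,z \in \mathcal{X}} \pi(x)\, k_{xz}\, d_Q(x,z) \\
\text{Subject to:}\quad & k_{xz} \le e^{\frac{\epsilon}{\delta} d_G(x,x')}\, k_{x'z} && z \in \mathcal{X},\ (x,x') \in E && (1)\\
& \sum_{z \in \mathcal{X}} k_{xz} = 1 && x \in \mathcal{X} && (2)\\
& k_{xz} \ge 0 && x, z \in \mathcal{X}
\end{aligned}
$$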
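The primal program above can be assembled in a solver-agnostic sparse form, as sketched below, with variable $k_{xz}$ stored at index $x \cdot n + z$. The representation (lists of dictionaries mapping variable indices to coefficients) and the helper names are our own. The rows could then be handed to an LP library such as GLPK, which the experiments used, but that conversion is not shown. `is_feasible` checks a candidate mechanism against every constraint.

```python
import math
from typing import Dict, List, Sequence, Tuple

Row = Dict[int, float]  # sparse row: variable index -> coefficient


def build_primal_lp(prior: Sequence[float], d_q: Sequence[Sequence[float]],
                    d_g: Sequence[Sequence[float]], edges: Sequence[Tuple[int, int]],
                    eps: float, delta: float):
    """Return (c, A_ub, b_ub, A_eq, b_eq); k_xz is variable x * n + z, with k >= 0."""
    n = len(prior)

    def var(x: int, z: int) -> int:
        return x * n + z

    c = [prior[x] * d_q[x][z] for x in range(n) for z in range(n)]
    A_ub: List[Row] = []
    for x, xp in edges:  # ordered pairs: list both (x, x') and (x', x) when needed
        factor = math.exp(eps / delta * d_g[x][xp])
        for z in range(n):
            # constraint (1): k_xz - e^{(eps/delta) d_G(x, x')} k_x'z <= 0
            A_ub.append({var(x, z): 1.0, var(xp, z): -factor})
    b_ub = [0.0] * len(A_ub)
    # constraint (2): each row of the mechanism sums to one
    A_eq: List[Row] = [{var(x, z): 1.0 for z in range(n)} for x in range(n)]
    b_eq = [1.0] * n
    return c, A_ub, b_ub, A_eq, b_eq


def dot(row: Row, k: Sequence[float]) -> float:
    return sum(coef * k[i] for i, coef in row.items())


def is_feasible(k: Sequence[float], A_ub: List[Row], b_ub: Sequence[float],
                A_eq: List[Row], b_eq: Sequence[float], tol: float = 1e-9) -> bool:
    """Check a flattened candidate mechanism k against all constraints."""
    return (all(v >= -tol for v in k)
            and all(dot(r, k) <= b + tol for r, b in zip(A_ub, b_ub))
            and all(abs(dot(r, k) - b) <= tol for r, b in zip(A_eq, b_eq)))


def objective(c: Sequence[float], k: Sequence[float]) -> float:
    return sum(ci * ki for ci, ki in zip(c, k))
```

The uniform mechanism ($k_{xz} = 1/n$) is always feasible, since each constraint of type (1) then reads $1/n \le e^{(\epsilon/\delta) d_G(x,x')}/n$. This makes it a convenient sanity check for any solver output.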


Therefore, for the dual program we consider two sets of variables:

— The variables of the form $a_{xx'z}$, with $z \in \mathcal{X}$, $(x,x') \in E$, corresponding to the constraints in (1).

— The variables of the form $b_x$, with $x \in \mathcal{X}$, corresponding to the constraints in (2).

The dual linear program is then as follows: 

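$$
\begin{aligned}
\text{Maximize:}\quad & \sum_{x \in \mathcal{X}} b_x \\
\text{Subject to:}\quad & b_x + \sum_{x' : (x,x') \in E} \left( e^{\frac{\epsilon}{\delta} d_G(x',x)}\, a_{x'xz} - a_{xx'z} \right) \le \pi(x)\, d_Q(x,z) && x, z \in \mathcal{X}\\
& a_{xx'z} \ge 0 && z \in \mathcal{X},\ (x,x') \in E
\end{aligned}
$$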
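A short sketch of how the dual constraint arises. It assumes, as the single sum above implicitly does, that $E$ and $d_G$ are symmetric. Write (1) as $e^{\frac{\epsilon}{\delta} d_G(x,x')} k_{x'z} - k_{xz} \ge 0$, so that its multiplier $a_{xx'z}$ is non-negative, and collect the coefficients of each primal variable $k_{xz}$, whose cost is $\pi(x)\, d_Q(x,z)$:

$$
\underbrace{b_x}_{\text{row sum (2) for } x}
\;+\; \underbrace{\sum_{x' : (x',x) \in E} e^{\frac{\epsilon}{\delta} d_G(x',x)}\, a_{x'xz}}_{k_{xz} \text{ on the right of (1) for } (x',x)}
\;-\; \underbrace{\sum_{x' : (x,x') \in E} a_{xx'z}}_{k_{xz} \text{ on the left of (1) for } (x,x')}
\;\le\; \pi(x)\, d_Q(x,z)
$$

Since $k_{xz} \ge 0$, the dual constraints are inequalities. Since (2) is an equality, the variables $b_x$ are unrestricted in sign, and the right-hand sides of (1) are zero, so only the $b_x$ appear in the dual objective. With $E$ symmetric, both sums range over the same neighbours $x'$ and combine into the single sum of the dual program.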